Abstract. This paper describes a greedy $\Delta$-approximation algorithm for MONOTONE COVERING, a generalization of many fundamental NP-hard covering problems. The approximation ratio $\Delta$ is the maximum number of variables on which any constraint depends. (For example, for vertex cover, $\Delta$ is 2.) The algorithm unifies, generalizes, and improves many previous algorithms for fundamental covering problems such as vertex cover, set cover, facilities location, and integer and mixed-integer covering linear programs with upper bounds on the variables.

The algorithm is also the first $\Delta$-competitive algorithm for online monotone covering, which generalizes online versions of the above-mentioned covering problems as well as many fundamental online paging and caching problems. As such it also generalizes many classical online algorithms, including LRU, FIFO, FWF, BALANCE, GREEDY-DUAL, GREEDY-DUAL SIZE (a.k.a. LANDLORD), and algorithms for connection caching, where $\Delta$ is the cache size. It also gives new $\Delta$-competitive algorithms for upgradable variants of these problems, which model choosing the caching strategy and an appropriate hardware configuration (cache size, CPU, bus, network, etc.).

1 Introduction

The classification of general techniques is an important research program within the field of approximation algorithms. What are the scopes of, and the relationships between, the various algorithm-design techniques such as the primal-dual method, the local-ratio method [5], and randomized rounding? Within this research program, an important question is which problems admit optimal and fast greedy approximation algorithms, and by what techniques [25,11]?
We give here a single online greedy $\Delta$-approximation algorithm for a combinatorially rich class of monotone covering problems, including many classical covering problems as well as online paging and caching problems. The approximation ratio, $\Delta$, is the maximum number of variables on which any constraint depends. (For vertex cover, $\Delta = 2$.)

For some problems in the class, no greedy (or other) $\Delta$-approximation algorithms were known. For others, previous greedy $\Delta$-approximation algorithms were known, but with non-trivial and seemingly problem-specific analyses. For vertex cover and set cover, in the early 1980's, Hochbaum gave an algorithm that rounds a solution to the standard LP relaxation [33]; Bar-Yehuda and Even gave a linear-time greedy algorithm [6]. A few years later, for set multicover, Hall and Hochbaum gave a quadratic-time primal-dual algorithm [26]. In the late 1990's, Bertsimas and Vohra generalized all of these results with a quadratic-time primal-dual algorithm for covering integer programs (CIP), restricted to $\{0,1\}$-variables and integer constraint matrix $A$, and with approximation ratio $\max_i \sum_j A_{ij} \ge \Delta$ [10]. Most recently, in 2000, Carr et al. gave the first (and only previous) $\Delta$-approximation for general CIP with $\{0,1\}$ variables [15].¹ They state (without details) that their result extends to allow general upper bounds on the variables (restricting $x_j \in \{0, 1, 2, \ldots, u_j\}$). In 2009 (independently of this work), [46] gives details of an extension to CIP with general upper bounds on the variables. Both [15] and [46] use exponentially many valid "Knapsack Cover" (KC) inequalities to reduce the integrality gap to $\Delta$. Their algorithms solve the LP using the ellipsoid method, so the running time is a high-degree polynomial.

Online paging and caching algorithms are also (online) monotone covering problems, as they can be formulated as online set cover [2]. These problems also have a rich history (see Fig. 1 and [12]).

All of the classical covering problems above (vertex cover, set cover, mixed integer linear programs with variable upper bounds (CMIP)) and others (facility location, probabilistic variants of these problems, etc.), as well as online variants (paging, weighted caching, file caching, (generalized) connection caching, etc.) are special cases of what we



* Partially supported by NSF awards CNS-0626912, CCF-0729071. 

1 The standard LP relaxation has an arbitrarily large integrality gap (e.g. $\min\{x_1 : 10x_1 + 10x_2 \ge 11;\ x_2 \le 1\}$ has gap 10).
Fig. 1. Some $\Delta$-approximation covering algorithms and deterministic online algorithms. "*" ...ened here.
call monotone covering. Formally, a monotone covering instance is specified by a collection $C \subseteq 2^{\mathbb{R}^n_+}$ of constraints and a non-negative, non-decreasing, submodular² objective function, $c : \mathbb{R}^n_+ \to \mathbb{R}_+$. The problem is to compute $\min\{c(x) : x \in \mathbb{R}^n_+,\ (\forall S \in C)\ x \in S\}$. Each constraint $S \in C$ must be monotone (i.e., closed upwards), but can be non-convex.

Monotone covering allows each variable to take values throughout $\mathbb{R}_+$, but can still model problems with restricted variable domains. For example, formulate vertex cover as $\min\{\sum_v c_v x_v : x \in \mathbb{R}^V_+,\ (\forall (u,w) \in E)\ \lfloor x_u \rfloor + \lfloor x_w \rfloor \ge 1\}$. Given any 2-approximate solution $x$ to this formulation (which allows $x_u \in \mathbb{R}_+$), rounding each $x_u$ down to its floor gives a 2-approximate integer solution. Generally, to model problems where each variable $x_j$ should take values in some closed set $U_j \subseteq \mathbb{R}_+$ (e.g. $U_j = \{0,1\}$ or $U_j = \mathbb{Z}_+$), one allows $x \in \mathbb{R}^n_+$ but replaces each monotone constraint $x \in S$ by the monotone constraint $x \in \mu^{-1}(S)$, where $\mu^{-1}(S) = \{x : \mu(x) \in S\}$ and $\mu_j(x) = \max\{z \in U_j,\ z \le x_j\}$. If $x \in \mathbb{R}^n_+$ is any $\Delta$-approximate solution to the modified problem, then $\mu(x)$ will be a $\Delta$-approximate solution respecting the variable domains. (For vertex cover each $U_j = \mathbb{Z}_+$, so $\mu_j(x) = \lfloor x_j \rfloor$.)³
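The rounding map $\mu$ can be illustrated with a minimal Python sketch. The function name `mu`, the `"Z+"` marker for the non-negative integers, and the representation of each finite domain $U_j$ as a sorted list are assumptions made only for this illustration; they do not come from the paper.

```python
import bisect
import math

def mu(x, domains):
    """Round each x_j down to max{z in U_j : z <= x_j}.

    domains[j] is either the string "Z+" (non-negative integers) or a
    sorted list of the finite domain U_j (hypothetical encoding).
    """
    rounded = []
    for xj, Uj in zip(x, domains):
        if Uj == "Z+":
            rounded.append(math.floor(xj))
            continue
        i = bisect.bisect_right(Uj, xj)   # number of elements <= xj
        if i == 0:
            raise ValueError("no element of U_j is <= x_j")
        rounded.append(Uj[i - 1])
    return rounded
```

For vertex cover every domain is `"Z+"`, so `mu` reduces to taking floors coordinate-wise.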



Section 2 describes our greedy $\Delta$-approximation algorithm (Alg. 1) for monotone covering. It is roughly the following: consider the constraints in any order; to satisfy a constraint, raise each variable in the constraint continuously and simultaneously, at rate inversely proportional to its cost. At termination, round $x$ down to $\mu(x)$ if appropriate.

The proof of the approximation ratio is relatively simple: with each step, the cost incurred by the algorithm is at most $\Delta$ times the reduction in the residual cost (the minimum possible cost to augment the current $x$ to feasibility). The algorithm is online (as described below), and admits distributed implementations (see [39]).

The running time depends on the implementation, which is problem specific, but can be fast. Section 2 describes linear-time implementations for vertex cover, set cover, and (non-metric) facility location. Section 3 describes a nearly linear-time implementation for covering mixed integer linear programs with variable upper bounds (CMIP). (In contrast, the only previous $\Delta$-approximation algorithm (for CIP, a slight restriction of CMIP) uses the ellipsoid method; its running time is a high-degree polynomial [15].) Section 4 describes an extension to a probabilistic (two-stage) variant of monotone covering, which naturally has submodular cost. The implementation for this case takes time



2 Formally, $c(x) + c(y) \ge c(x \wedge y) + c(x \vee y)$, where $x \wedge y$ (and $x \vee y$) are the component-wise minimum (and maximum) of $x$ and $y$. Intuitively, there is no positive synergy between the variables: let $\partial_j c(x)$ denote the rate at which increasing $x_j$ would increase $c(x)$; then, increasing $x_i$ (for $i \ne j$) does not increase $\partial_j c(x)$. Any separable function $c(x) = \sum_j c_j(x_j)$ is submodular, the product $c(x) = \prod_j x_j$ is not. The maximum $c(x) = \max_j x_j$ is submodular, the minimum $c(x) = \min_j x_j$ is not.

3 In this setting, if the cost is defined only on the restricted domain, it should be extended to $\mathbb{R}^n_+$ for the algorithm. One way is to take the cost of $x \in \mathbb{R}^n_+$ to be the expected cost of $\hat x$, where $\hat x_j$ is rounded up or down to its nearest elements $a, b$ in $U_j$ such that $a \le x_j \le b$: take $\hat x_j = b$ with probability $\frac{x_j - a}{b - a}$, otherwise take $\hat x_j = a$. If $a$ or $b$ doesn't exist, let $\hat x_j$ be the one that does.



$O(N \hat\Delta \log \Delta)$, where $N$ is the number of non-zeros in the constraint matrix and $\hat\Delta$ is the maximum number of constraints in which any variable appears. (For comparison, [30] gives a $\ln(n)$-approximation algorithm for the special case of probabilistic set cover; the algorithm is based on submodular-function minimization [45], resulting in high-degree-polynomial run-time.⁴)

Section 5 discusses online monotone covering. Following [13], an online algorithm must maintain a current $x$; as constraints $S \in C$ are revealed one by one, the algorithm must increase coordinates of $x$ to satisfy $x \in S$. The algorithm can't decrease coordinates of $x$. An algorithm is $\Delta$-competitive if $c(x)$ is at most $\Delta$ times the minimum cost of any solution $x^*$ that meets all the constraints.



The greedy algorithm (Alg. 1) is an online algorithm. Thus, it gives $\Delta$-competitive algorithms for online versions of all of the covering problems mentioned above. It also generalizes many classical deterministic online algorithms for paging and caching, including LRU, FIFO, FWF for paging [48], Balance and Greedy Dual for weighted caching [16,52], Landlord [53], a.k.a. Greedy Dual Size [14], for file caching, and algorithms for connection caching [18,19,20,1]. The competitive ratio $\Delta$ is the cache size, commonly denoted $k$, or, in the case of file caching, the maximum number of files ever held in cache (at most $k$ or $k + 1$, depending on the specification). This is the best possible competitive ratio for deterministic online algorithms for these problems.

Section 5 also illustrates the generality of online monotone covering by describing a $(k + d)$-competitive algorithm for a new class of upgradable caching problems. In upgradable caching, the online algorithm chooses not only which pages to evict, but also how to configure and upgrade the relevant hardware components (determining such parameters as the cache size, CPU, bus, and network speeds, etc.). In the competitive ratio, $d$ is the number of configurable hardware parameters. We know of no previous results for upgradable caching, although the classical online rent-or-buy (a.k.a. ski rental) problem [36] and its "multislope" generalization [41] have the basic characteristic (paying a fixed cost now can reduce many later costs; these are special cases of online monotone covering with $\Delta = 2$).

Section 6 describes a natural randomized generalization of Alg. 1 with more flexibility in incrementing the variables. This yields a stateless online algorithm, generalizing the Harmonic $k$-server algorithm (as it specializes for paging and weighted caching [47]) and Pitt's weighted vertex-cover algorithm [4].

Section 7 concludes by discussing the relation of the analysis here to the primal-dual and local-ratio methods. As a rule of thumb, greedy approximation algorithms can generally be analysed naturally via the primal-dual method, and sometimes even more naturally via the local-ratio method. The results here extend many primal-dual and local-ratio results. We conjecture that it is possible, but unwieldy, to recast the analysis here via primal-dual. It can be recast as a local-ratio analysis, but in a non-traditional form.
For distributed implementations of Alg. 1 running in $O(\log^2 n)$ rounds (or $O(\log n)$ for $\Delta = 2$), see [39].

We assume throughout that the reader is familiar with classical covering problems [15,34] as well as classical online paging and caching problems and algorithms [12].

Alternatives to $\Delta$-Approximation: log-Approximations, Randomized Online Algorithms. In spite of extensive work, no $(2 - \varepsilon)$-approximation algorithm for constant $\varepsilon > 0$ is yet known for vertex cover [28,32,7,44,27,29,21,37]. For small $\Delta$, it seems that $\Delta$-approximation may be the best possible in polynomial time.

As an alternative when $\Delta$ is large, many covering problems considered here also admit $O(\log \hat\Delta)$-approximation algorithms, where $\hat\Delta$ is the maximum number of constraints in which any variable occurs. Examples include a greedy algorithm for set cover [35,42,17] (1975) and greedy $O(\log \max_j \sum_i A_{ij})$-approximation algorithms for CIP with $\{0,1\}$-variables and integer $A$ [22,24] (1982). Srinivasan gave $O(\log \hat\Delta)$-approximation algorithms for general CIP without variable upper bounds [49,50] (2000); these were extended to CIP with variable upper bounds by Kolliopoulos et al. [38] (2005). (The latter algorithm solves the CIP relaxation with KC inequalities, then randomly rounds the solution.) The class of $O(\log \hat\Delta)$-approximation algorithms for general CIP is not yet fully understood; these algorithms could yet be subsumed by a single fast greedy algorithm.

For most online problems here, no deterministic online algorithm can be better than $\Delta$-competitive. But many online problems admit better-than-$\Delta$-competitive randomized algorithms. Examples include rent-or-buy [36,40], paging [23,43], weighted caching [2,14], connection caching [18], and file caching [3]. Some cases of online monotone covering (e.g. vertex cover) are unlikely to have better-than-$\Delta$-competitive randomized algorithms. It would be interesting to classify which cases admit better-than-$\Delta$-competitive randomized online algorithms.



4 [30] also mentions a 2-approximation for probabilistic vertex cover, without details.



greedy algorithm for monotone covering (monotone constraints $C$, submodular objective $c$)    alg. 1

output: feasible $x \in S$ ($\forall S \in C$), $\Delta$-approximately minimizing $c(x)$ (see Thm. 1)

1. Let $x \leftarrow 0$.    ... $\Delta = \max_{S \in C} |\mathrm{vars}(S)|$ is the max # of vars any constraint depends on
2. While $\exists S \in C$ such that $x \notin S$, do step$(x, S)$ for any $S$ such that $x \notin S$.
3. Return $x$.    ... or $\mu(x)$ in the case of restricted variable domains; see the introduction.

subroutine step$_c(x, S)$:    ... makes progress towards satisfying $x \in S$.

1. Choose a scalar step size $\beta > 0$.    ... choose $\beta$ subject to restriction in Thm. 1
2. For $j \in \mathrm{vars}(S)$, let $x'_j \in \mathbb{R}_+ \cup \{\infty\}$ be the maximum such that raising $x_j$ to $x'_j$ would raise $c(x)$ by at most $\beta$.
3. For $j \in \mathrm{vars}(S)$, let $x_j \leftarrow x'_j$.    ... if $c$ is linear, then $x'_j = x_j + \beta/c_j$ for $j \in \mathrm{vars}(S)$.



2 The Greedy Algorithm for Monotone Covering (Alg. 1)

Fix an instance of monotone covering. Let $\mathrm{vars}(S)$ denote the variables in $x$ that constraint $x \in S$ depends on, so that $\Delta = \max_{S \in C} |\mathrm{vars}(S)|$.

The algorithm (Alg. 1) starts with $x = 0$, then repeats the following step until all constraints are satisfied: choose any unmet constraint and a step size $\beta > 0$; for each variable $x_j$ that the constraint depends on ($j \in \mathrm{vars}(S)$), raise that variable so as to increase the cost $c(x)$ by at most $\beta$. (The step increases the total cost by at most $\Delta\beta$.)

The algorithm returns $x$ (or, if variable domains are restricted as described in the introduction, $\mu(x)$).

The algorithm returns a $\Delta$-approximation, as long as each step size $\beta$ is at most the minimum cost to optimally augment $x$ to satisfy $S$, that is, $\min\{c(\hat x) - c(x) : \hat x \in S,\ \hat x \ge x\}$. Denote this cost $\mathrm{distance}_c(x, S)$. Also, let $\mathrm{residual}_c(x)$ be the residual cost of $x$: the minimum cost to augment $x$ to full feasibility, i.e., $\mathrm{distance}_c(x, \bigcap_{S \in C} S)$.

Theorem 1. For monotone covering, the greedy algorithm (Alg. 1) returns a $\Delta$-approximate solution as long as it chooses step size $\beta \le \mathrm{distance}_c(x, S)$ in each step (and eventually terminates).

Proof. First, a rough intuition. Each step starts with $x \notin S$. Since the optimal solution $x^*$ is in $S$ and $S$ is monotone, there must be at least one $k \in \mathrm{vars}(S)$ such that $x_k < x^*_k$. By raising all $x_j$ for $j \in \mathrm{vars}(S)$, the algorithm makes progress "covering" at least that coordinate $x^*_k$ of $x^*$. Provided the step increases $x_k$ to $x'_k \le x^*_k$, the cost incurred can be charged to a corresponding portion of the cost of $x^*_k$ (intuitively, to the cost of the part of $x^*_k$ in the interval $[x_k, x'_k]$; formally, to the decrease in the residual cost from increasing $x_k$, provably at least $\beta$). Since the step increases $c(x)$ by at most $\beta\Delta$, and results in a charge to $c(x^*)$ of at least $\beta$, this proves the $\Delta$-approximation.

Here is the formal proof. By inspection (using that $c$ is submodular) each step of the algorithm increases $c(x)$ by at most $\beta|\mathrm{vars}(S)| \le \beta\Delta$. We show that $\mathrm{residual}(x)$ decreases by at least $\beta$, so the invariant $c(x)/\Delta + \mathrm{residual}(x) \le \mathrm{opt}$ holds, proving the theorem.
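The way the invariant yields the theorem can be spelled out as follows. This is a short worked derivation under the two per-step bounds just stated; the potential $\Phi$ is a name introduced here only for illustration.

```latex
\begin{align*}
\Phi(x) &:= c(x)/\Delta + \mathrm{residual}(x), \\
\Phi(0) &= c(0)/\Delta + \bigl(\mathrm{opt} - c(0)\bigr) \;\le\; \mathrm{opt}
  && \text{(since } \Delta \ge 1 \text{ and } c(0) \ge 0\text{)}, \\
\Phi(x') - \Phi(x) &= \frac{c(x') - c(x)}{\Delta}
  - \bigl(\mathrm{residual}(x) - \mathrm{residual}(x')\bigr)
  \;\le\; \frac{\beta\Delta}{\Delta} - \beta \;=\; 0
  && \text{(one step, } x \to x'\text{)}, \\
c(x_{\mathrm{final}})/\Delta &= \Phi(x_{\mathrm{final}}) \;\le\; \Phi(0) \;\le\; \mathrm{opt}
  && \text{(at termination } \mathrm{residual}(x_{\mathrm{final}}) = 0\text{)}.
\end{align*}
```

So the final cost is at most $\Delta \cdot \mathrm{opt}$, provided each step satisfies both bounds.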

Let $x$ and $x'$, respectively, be $x$ before and after a given step. Let feasible $x^* \ge x$ be an optimal augmentation of $x$ to full feasibility, so $c(x^*) - c(x) = \mathrm{residual}(x)$. Let $x \wedge y$ (resp. $x \vee y$) denote the component-wise minimum (resp. maximum) of $x$ and $y$. By the submodularity of $c$, $c(x') + c(x^*) \ge c(x' \vee x^*) + c(x' \wedge x^*)$. (Equality holds if $c$ is separable (e.g. linear).)

Rewriting gives $[c(x^*) - c(x)] - [c(x' \vee x^*) - c(x')] \ge c(x' \wedge x^*) - c(x)$.

The first bracketed term is $\mathrm{residual}(x)$. The second is at least $\mathrm{residual}(x')$, because $x^* \vee x' \ge x'$ is feasible. Thus,

$$\mathrm{residual}(x) - \mathrm{residual}(x') \ge c(x' \wedge x^*) - c(x). \qquad (1)$$

To complete the proof, we show the right-hand side of (1) is at least $\beta$.

Case 1. Suppose $x'_k < x^*_k$ for some $k \in \mathrm{vars}(S)$. (In this case it must be that increasing $x_k$ to $x'_k$ costs $\beta$.) Let $y$ be $x$ with just $x_k$ raised to $x'_k$. Then $c(x' \wedge x^*) \ge c(y) = c(x) + \beta$.

Case 2. Otherwise $x' \wedge x^* \in S$, because $x^* \in S$ and $x'_j \ge x^*_j$ for all $j \in \mathrm{vars}(S)$. Also $x' \wedge x^* \ge x$. Thus, the right-hand side of (1) is at least $\mathrm{distance}_c(x, S)$. By assumption this is at least $\beta$. □

Choosing the step size, $\beta$. In a sense, the algorithm reduces the given problem to a sequence of subproblems, each of which requires computing a lower bound on $\mathrm{distance}(x, S)$ for the current $x$ and a given unmet constraint $S$. To completely specify the algorithm, one must specify how to choose $\beta$ in each step.

Thm. 1 allows $\beta$ to be small. At a minimum, $\mathrm{distance}(x, S) > 0$ when $x \notin S$, so one can take $\beta$ to be infinitesimal. Then Alg. 1 raises $x_j$ for $j \in \mathrm{vars}(S)$ continuously at rate inversely proportional to $\partial c(x)/\partial x_j$ (at most until $x \in S$).

Another, generic, choice is to take $\beta$ just large enough to satisfy $x \in S$. This also satisfies the theorem:



subroutine stepsize$_c(x, S(I, A_i, u, b_i))$ (for CMIP)    alg. 2

1. Order $I = (j_1, j_2, \ldots, j_k)$ by decreasing $A_{ij}$.    ... So $A_{ij_1} \ge A_{ij_2} \ge \cdots \ge A_{ij_k}$.
   Let $J = J(x, S)$ contain the minimal prefix of $I$ such that $x \notin S(J, A_i, u, b_i)$.
   Let $S'$ denote the relaxed constraint $S(J, A_i, u, b_i)$.
2. Let $U = U(x, S) = \{j : x_j \ge u_j;\ A_{ij} > 0\}$ contain the variables that have hit their upper bounds.
3. Let $\beta_J = \min_{j \in J - U} (1 - x_j + \lfloor x_j \rfloor) c_j$ be the minimum cost to increase any floored term in $S'$.
4. Let $\beta_{\bar J} = \min_{j \in \bar J - U} c_j b'_i / A_{ij}$, where $b'_i$ is the slack ($b_i$ minus the value of the left-hand side of $S'$), be the minimum cost to increase the sum of fractional terms in $S'$ to satisfy $S'$.
5. Return $\beta = \min\{\beta_J, \beta_{\bar J}\}$.



Observation 1 Let $\beta$ be the minimum step size so that step$(x, S)$ brings $x$ into $S$. Then $\beta \le \mathrm{distance}_c(x, S)$.
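To make the "just large enough" choice concrete, here is a minimal Python sketch of Alg. 1 for the simplest setting: linear cost $c \cdot x$ and fractional linear covering constraints $\sum_j a_j x_j \ge b$ with no floors or upper bounds. The function name `greedy_covering_lp` and the input encoding are assumptions made for this illustration. Exact rational arithmetic is used so that constraints are met exactly.

```python
from fractions import Fraction

def greedy_covering_lp(costs, constraints):
    """Sketch of Alg. 1 for min c.x s.t. sum_j a_j x_j >= b, x >= 0.

    costs: list of positive costs c_j.
    constraints: list of (a, b), where a is a dict {j: a_j > 0}, b > 0.
    """
    c = [Fraction(v) for v in costs]
    x = [Fraction(0)] * len(c)
    for a, b in constraints:              # any order is allowed
        lhs = sum(Fraction(a_j) * x[j] for j, a_j in a.items())
        slack = Fraction(b) - lhs
        if slack <= 0:
            continue                      # x is already in S
        # Raising every x_j (j in vars(S)) by beta / c_j raises the
        # left-hand side by beta * sum_j a_j / c_j, so this beta is
        # the smallest step that brings x into S.
        beta = slack / sum(Fraction(a_j) / c[j] for j, a_j in a.items())
        for j in a:
            x[j] += beta / c[j]
    return x
```

A single pass suffices here because variables only increase and every constraint is monotone, so a constraint that has been satisfied stays satisfied.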

Thm. 1 can also allow $\beta$ to be more than large enough to satisfy the constraint. Consider $\min\{x_1 + 2x_2 : x \in S\}$ where $S = \{x : x_1 + x_2 \ge 1\}$. Start with $x = 0$. Then $\mathrm{distance}(x, S) = 1$. The theorem allows $\beta = 1$. A single step with $\beta = 1$ gives $x_1 = 1$ and $x_2 = 1/2$, so that $x_1 + x_2 = 3/2 > 1$.

Generally, one has to choose $\beta$ small enough to satisfy the theorem, but large enough so that the algorithm doesn't take too many steps. The computational complexity of doing this has to be addressed on a per-application basis. Consider a simple subset-sum example: $\min\{c \cdot x : x \in S\}$ where the single constraint $S$ contains $x \ge 0$ such that $\sum_j c_j \min(1, \lfloor x_j \rfloor) \ge 1$. Computing $\mathrm{distance}(0, S)$ is NP-hard, but it is easy to compute a useful $\beta$, for example $\beta = \min_{j : x_j < 1} c_j (1 - x_j)$. With this choice, the algorithm will satisfy $S$ within $\Delta$ steps.

As a warm-up, here are linear-time implementations for facility location, set cover, and vertex cover.

Theorem 2. For (non-metric) facility location, set cover, and vertex cover, the greedy $\Delta$-approximation algorithm (Alg. 1) has a linear-time implementation. For facility location $\Delta$ is the maximum number of facilities that might serve any given customer.

Proof. Formulate facility location as minimizing the submodular objective $\sum_j f_j \max_i x_{ij} + \sum_{ij} d_{ij} x_{ij}$ subject to, for each customer $i$, $\sum_{j \in N(i)} \lfloor x_{ij} \rfloor \ge 1$ (where $j \in N(i)$ if customer $i$ can use facility $j$).⁵

The implementation starts with all $x_{ij} = 0$. It considers the customers $i$ in any order. For each it does the following: let $\beta = \min_{j \in N(i)} [d_{ij} + f_j (1 - \max_{i'} x_{i'j})]$ (the minimum cost to raise $x_{ij}$ to 1 for any $j \in N(i)$). Then, for each $j \in N(i)$, raise $x_{ij}$ by $\min[\beta/d_{ij},\ (\beta + f_j \max_{i'} x_{i'j})/(d_{ij} + f_j)]$ (just enough to increase the cost by $\beta$). By maintaining, for each facility $j$, $\max_i x_{ij}$, the above can be done in linear time, $O(\sum_i |N(i)|)$.

Vertex cover and set cover are the special cases when $d_{ij} = 0$. □
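For the set-cover special case ($d_{ij} = 0$), the implementation in the proof can be sketched in a few lines of Python. The names `greedy_set_cover`, `element_sets`, and `level` (which plays the role of $\max_i x_{ij}$ for set $j$) are assumptions for this illustration, and positive set costs are assumed. With $d_{ij} = 0$ the raise amount simplifies to $\beta / f_j$ above the current level of set $j$.

```python
from fractions import Fraction

def greedy_set_cover(costs, element_sets):
    """Sketch of the linear-time greedy algorithm for weighted set cover.

    costs[j]: positive cost f_j of set j.
    element_sets[i]: indices of the sets that contain element i (N(i)).
    Returns (chosen set indices, total cost of the chosen sets).
    """
    level = [Fraction(0)] * len(costs)    # level[j] stands for max_i x_ij
    for sets_of_i in element_sets:        # elements in any order
        # Minimum cost to bring some set containing i up to level 1.
        beta = min(costs[j] * (1 - level[j]) for j in sets_of_i)
        if beta <= 0:
            continue                      # element i is already covered
        for j in sets_of_i:
            level[j] += Fraction(beta) / costs[j]
    chosen = [j for j, v in enumerate(level) if v >= 1]
    return chosen, sum(costs[j] for j in chosen)
```

Each element is processed once and touches only the sets containing it, which gives the $O(\sum_i |N(i)|)$ running time. Vertex cover is the case where every element (edge) lies in exactly two sets (its endpoints).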

3 Nearly Linear-Time Implementation for Covering Mixed Integer Linear Programs

Theorem 3. For CMIP (covering mixed integer linear programs with upper bounds), the greedy algorithm (Alg. 1) can be implemented to return a $\Delta$-approximation in $O(N \log \Delta)$ time, where $\Delta$ is the maximum number of non-zeroes in any constraint and $N$ is the total number of non-zeroes in the constraint matrix.

Proof (sketch). Fix any CMIP instance $\min\{c \cdot x : x \in \mathbb{R}^n_+;\ Ax \ge b;\ x \le u;\ x_j \in \mathbb{Z}\ (j \in I)\}$. Model each constraint $A_i x \ge b_i$ using a monotone constraint $S \in C$ of the form

$$\sum_{j \in I} A_{ij} \lfloor \min(x_j, u_j) \rfloor + \sum_{j \in \bar I} A_{ij} \min(x_j, u_j) \ge b_i \qquad\qquad S(I, A_i, u, b_i)$$

where set $I$ contains the indexes of the integer variables.

Given such a constraint $S$ and an $x \notin S$, the subroutine stepsize$(x, S)$ (Alg. 2) computes a step size $\beta$ satisfying Thm. 1 as follows. Let $S'$, $J$, $U$, $\beta_J$, $\beta_{\bar J}$, and $\beta$ be as in Alg. 2. That is, $S' = S(J, A_i, u, b_i)$ is the relaxation of $S(I, A_i, u, b_i)$ obtained by relaxing the floors in $S$ (in order of increasing $A_{ij}$) as much as possible, while maintaining $x \notin S'$; $J \subseteq I$ contains the indices $j$ of variables whose floors are not relaxed. Increasing $x$ to satisfy $S'$ requires (at least) either: (i) increasing $\sum_{j \in J - U} A_{ij} \lfloor x_j \rfloor$, at cost at least $\beta_J$, or (ii) increasing $\sum_{j \in \bar J - U} A_{ij} x_j$ by at least the slack $b'_i$ of the constraint $S'$, at cost at least $\beta_{\bar J}$. Thus, $\mathrm{distance}(x, S) \ge \mathrm{distance}(x, S') \ge \min\{\beta_J, \beta_{\bar J}\} = \beta$. This choice satisfies Thm. 1, so the algorithm returns a $\Delta$-approximate solution.
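The step-size computation of Alg. 2 can be sketched directly (naively, in time linear in the constraint size per prefix tested, not with the heap-based implementation mentioned later). The function name `stepsize`, the dict-based encoding of one constraint row, and the use of `math.inf` when a minimum ranges over an empty set are assumptions made for this illustration.

```python
import math
from fractions import Fraction

def stepsize(x, c, A, u, b, int_vars):
    """Naive sketch of Alg. 2 for one violated CMIP constraint S.

    x, c, u: dicts j -> current value, cost, upper bound.
    A: dict j -> A_ij > 0 over vars(S); b: right-hand side b_i.
    int_vars: the integer-constrained indices among the keys of A.
    """
    order = sorted(int_vars, key=lambda j: -A[j])     # decreasing A_ij

    def lhs(J):
        floored = set(J)
        return sum(A[j] * (math.floor(min(x[j], u[j])) if j in floored
                           else min(x[j], u[j])) for j in A)

    # Minimal prefix J of `order` such that x violates S(J, A_i, u, b_i).
    for k in range(len(order) + 1):
        if lhs(order[:k]) < b:
            break
    else:
        raise ValueError("x already satisfies S")
    J = set(order[:k])

    at_bound = {j for j in A if x[j] >= u[j]}         # the set U
    slack = b - lhs(J)                                # b'_i, positive
    beta_J = min(((1 - x[j] + math.floor(x[j])) * c[j]
                  for j in J - at_bound), default=math.inf)
    beta_Jbar = min((c[j] * slack / A[j]
                     for j in set(A) - J - at_bound), default=math.inf)
    return min(beta_J, beta_Jbar)
```

For example, with one integer variable (coefficient 3, cost 3), one continuous variable (coefficient 1, cost 1), right-hand side 2, and the integer variable currently at 9/10, the floor cannot be relaxed without satisfying the relaxed constraint, so $J$ contains the integer variable and the step size is the cost $3 \cdot (1/10)$ of raising it to 1.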

5 The standard ILP is not a covering ILP due to constraints $x_{ij} \le y_j$. The standard reduction to set cover increases $\Delta$ exponentially.



Lemma 1. For any $S$, Alg. 1 calls step$(x, S)$ with $\beta = $ stepsize$(x, S)$ (from Alg. 2) at most $2|\mathrm{vars}(S)|$ times.

Proof (sketch). Let $j$ be the index of the variable $x_j$ that determines $\beta$ in the algorithm ($\beta_J$ in case (i) of the previous proof, or $\beta_{\bar J}$ in case (ii)). The step increases $x_j$ by $\beta/c_j$. This may bring $x_j$ to (or above) its upper bound $u_j$. If not, then, in case (i), the left-hand side of $S'$ increases by at least $A_{ij}$, which, by the minimality of $J(x)$ and the ordering of $I$, is enough to satisfy $S'$. Or, in case (ii), the left-hand side increases by the slack $b'_i$ (also enough to satisfy $S'$). Thus the step either increases the set $U(x)$ or satisfies $S'$, increasing the set $J(x)$. □

The naive implementations of stepsize() and step() run in time $O(|\mathrm{vars}(S)|)$ (after the $A_{ij}$'s within each constraint are sorted in preprocessing). By the lemma, with this implementation, the total time for the algorithm is $O(\sum_S |\mathrm{vars}(S)|^2) \le O(N\Delta)$. By a careful heap-based implementation, this time can be reduced to $O(N \log \Delta)$ (proof omitted). □

4 (Two-Stage) Probabilistic Monotone Covering 

An instance of probabilistic monotone covering is specified by an instance $(c, C)$ of monotone covering, along with activation probabilities $p_S$ for each constraint $S \in C$ and a non-decreasing, submodular first-stage objective $W$. The first stage requires the algorithm to commit to a vector $x^S \in S$ for each $S \in C$. In the second stage, the algorithm must pay to satisfy the activated constraints, where each constraint $S$ is independently activated with probability $p_S$. The algorithm pays $c(\hat x)$, where $\hat x$ is the minimal vector such that $\hat x \ge x^S$ for each active $S$ ($\hat x_j = \max\{x^S_j : S \text{ active}\}$). The problem is to choose the first-stage vectors to minimize the first-stage cost $W(x^S : S \in C)$ plus the expected second-stage cost, $E[c(\hat x)]$. This (expected) cost is submodular as long as $c$ is.

Observation 2 Probabilistic monotone covering reduces to monotone covering. 

Probabilistic CMIP is the special case where W is linear and the pair (c, C) define a CMIP. 

For example, consider a two-stage probabilistic facilities location problem specified by first-stage costs $f^1, d^1$, an activation probability $p_i$ for each customer $i$, and second-stage costs $f^2, d^2$. The algorithm assigns to each customer $i$ a facility $j(i) \in N(i)$ (those that can serve $i$), by setting $x_{ij(i)} = 1$ (satisfying constraints $\sum_{j \in N(i)} \lfloor x_{ij} \rfloor \ge 1$), then paying the first-stage cost $\sum_j f^1_j \max_i x_{ij} + \sum_{ij} d^1_{ij} x_{ij}$. Then, each customer $i$ is activated with probability $p_i$. Facilities assigned to activated customers are opened by setting $\hat x_{ij} = 1$ if $x_{ij} = 1$ and $i$ is active. The algorithm then pays the second-stage cost $\sum_j f^2_j \max_i \hat x_{ij} + \sum_{ij} d^2_{ij} \hat x_{ij}$. The algorithm should minimize its total expected payment. The degree $\Delta = \max_i |N(i)|$ is the maximum number of facilities that any given customer is eligible to use.

Theorem 4. For probabilistic CMIP,

(a) The greedy $\Delta$-approximation algorithm can be implemented to run in $O(N \hat\Delta \log \Delta)$ time, where $\hat\Delta$ is the maximum number of constraints per variable and $N = \sum_{S \in C} |\mathrm{vars}(S)|$ is the input size.

(b) When $p = 1$, it can be implemented to run in time $O(N \log \Delta)$ (generalizes CMIP and facilities location).

Proof (sketch). Let $X = (x^S)_{S \in C}$ be the matrix formed by the first-stage vectors. Let random variable $\hat x$ be as described in the problem definition ($\hat x_j = \max\{x^S_j : S \text{ active}\}$), so the problem is to choose $X$ subject to $x^S \in S$ for each $S$ to minimize $C(X) = W \cdot X + E[c \cdot \hat x]$. This function is submodular, increasing, and continuous in $X$.

To satisfy Thm. 1, the subroutine step$(X, S)$ must compute the step size $\beta$ to be at most $\mathrm{distance}(X, S)$ (the minimum possible increase in $C(X)$ required to satisfy $S$). For a given $X$ and $S$, have step$(X, S)$ compute $\beta$ as follows. For a given $X$, the rate at which increasing $x^S_j$ would increase $C(X)$ is

$$c'_j = w^S_j + c_j \Pr[x^S_j = \hat x_j] = w^S_j + c_j\, p_S \prod \{1 - p_R : x^R_j > x^S_j,\ j \in \mathrm{vars}(R)\}.$$

This rate does not change until $x^S_j$ reaches $t_j = \min\{x^R_j : x^R_j > x^S_j,\ j \in \mathrm{vars}(R)\}$.

Take $\beta = \min(\beta_t, \mathrm{stepsize}_{c'}(x^S, S))$, where $\beta_t = \min\{(t_j - x^S_j) c'_j : j \in \mathrm{vars}(S)\}$ is the minimum cost to bring any $x^S_j$ to its threshold, and stepsize() is the subroutine from Section 3 using the (linear) cost vector $c'$ defined above. This $\beta$ is a valid lower bound on $\mathrm{distance}(X, S)$, because $\beta_t$ is a lower bound on the cost to bring any $x^S_j$ to its next threshold, while $\mathrm{stepsize}_{c'}(x^S, S)$ is a lower bound on the cost to satisfy $S$ without bringing any $x^S_j$ to its threshold.

If step$(X, S)$ uses this $\beta$, the number of steps to satisfy $S$ is at most $O(|\mathrm{vars}(S)| \hat\Delta)$. Each step either (i) makes some $x^S_j$ reach its next threshold (and each $x^S_j$ crosses at most $\hat\Delta$ thresholds), or (ii) increases the number of "floored" variables or increases the number of variables at their upper bounds (which, by the analysis of stepsize() from Section 3, can happen at most $2|\mathrm{vars}(S)|$ times). Thus, the total number of steps is $O(\sum_S |\mathrm{vars}(S)| \hat\Delta)$, that is, $O(N \hat\Delta)$. (Implementation details needed to achieve amortized time $O(\log \Delta)$ per step are omitted.) This completes the proof sketch for part (a).

For part (b) of the theorem, note that in this case the product in the equation for $c'_j$ is 1 if $x^S_j = \max_R x^R_j$ and 0 otherwise. Each variable has at most one threshold to reach, so the number of calls to step$(X, S)$ is reduced to $O(|\mathrm{vars}(S)|)$. This allows an implementation in total time $O(N \log \Delta)$. □

5 Online Monotone Covering and Caching with Upgradable Hardware 

Recall that in online monotone covering, each constraint $S \in C$ is revealed one at a time; an online algorithm must raise variables in $x$ to bring $x$ into the given $S$, without knowing the remaining constraints. Alg. 1 (with, say, step$(x, S)$ taking $\beta$ just large enough to bring $x \in S$; see Observation 1) can do this, so it yields a $\Delta$-competitive online algorithm.⁶

Corollary 1. The greedy algorithm (Alg. 1) gives a $\Delta$-competitive online monotone covering algorithm.



Example: generalized connection caching. As discussed in the introduction (following the formulation of weighted caching as online set cover from [2]) this result naturally generalizes a number of known results for paging, weighted caching, file caching, connection caching, etc. To give just one example, consider connection caching. A request sequence $r$ is given online. Each request $r_t = (u_t, w_t)$ activates the connection $(u_t, w_t)$ (if not already activated) between nodes $u_t$ and $w_t$. If either node has more than $k$ active connections, then one of them other than $r_t$ (say $r_s$) must be closed at cost $\mathrm{cost}(r_s)$. Model this problem as follows. Let variable $x_t$ indicate whether connection $r_t$ is closed before the next request to $r_t$ after time $t$, so the total cost is $\sum_t \mathrm{cost}(r_t) x_t$. For each node $u$ and each time $t$, for any $(k+1)$-subset $Q \subseteq \{r_s : s \le t;\ u \in r_s\}$, at least one connection $r_s \in Q - \{r_t\}$ (where $s$ is the time of the most recent request to $r_s$) must have been closed, so the following constraint⁷ is met: $\sum_{r_s \in Q - \{r_t\}} \lfloor x_s \rfloor \ge 1$.

Corollary 1 gives the following $k$-competitive algorithm for online connection caching. When a connection request $(u, w)$ occurs at time $t$, the connection is activated and $x_t$ is set to 0. If a node, say $u$, has more than $k$ active connections, the current $x$ violates the constraint above for the set $Q$ containing its active connections. Node $u$ applies the step() subroutine for this constraint: it raises $x_s$ for all the connections $r_s \in Q - \{r_t\}$ at rate $1/\mathrm{cost}(r_s)$ simultaneously, until some $x_s$ reaches 1. It closes any such connection $r_s$.
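Restricted to a single node, the procedure above is an online weighted-caching rule, which the following minimal Python sketch simulates. The function name `online_weighted_caching`, the choice to evict exactly one item per fault, and the tie-breaking rule (least recently requested first, via dict insertion order) are assumptions made for this illustration; the text leaves the choice among tied items open. Costs are assumed to be positive integers or Fractions so that arithmetic is exact.

```python
from fractions import Fraction

def online_weighted_caching(requests, cost, k):
    """Simulate the greedy rule on one cache of capacity k.

    requests: sequence of items; cost: dict item -> positive weight.
    Returns the total eviction cost paid.
    """
    x = {}        # cached item -> its variable x_s, in [0, 1]
    paid = 0
    for r in requests:
        x.pop(r, None)
        x[r] = Fraction(0)      # fresh variable for this request
        if len(x) <= k:
            continue
        others = [s for s in x if s != r]
        # Smallest step that brings some x_s (s != r) up to 1.
        beta = min(cost[s] * (1 - x[s]) for s in others)
        for s in others:
            x[s] += Fraction(beta) / cost[s]   # rate 1 / cost(r_s)
        victim = next(s for s in others if x[s] >= 1)
        paid += cost[victim]
        del x[victim]
    return paid
```

With uniform costs and this tie-breaking rule the sketch evicts the least recently requested item, so it behaves like LRU; other tie-breaking rules yield other members of the family the text mentions.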

Remark on $k/(k - h + 1)$-competitiveness. The classic ratio of $k/(k - h + 1)$ (versus opt with cache size $h \le k$) can be reproduced in such a setting as follows. For any set $Q$ as described above, opt must meet the stronger constraint $\sum_{r_s \in Q - \{r_t\}} \lfloor x_s \rfloor \ge k - h + 1$. In this scenario, the proof of Thm. 1 extends to show a ratio of $k/(k - h + 1)$ (use that the variables are $\{0, 1\}$, so there are at least $k - h + 1$ variables $x_j$ such that $x_j < x^*_j$).

Upgradable online problems. Standard online caching problems model only the caching strategy. In practice other parameters (e.g., the size of the cache, the speed of the CPU, bus, network, etc.) must also be chosen well. In upgradable caching, the algorithm chooses not only the caching strategy, but also the hardware configuration. The hardware configuration is assumed to be determined by how much has been spent on each of some $d$ components. The configuration is modeled by a vector $y \in \mathbb{R}^d_+$, where $y_i$ has been spent so far on component $i$.

In response to each request, the algorithm can upgrade the hardware by increasing the $y_i$'s. Then, if the requested item $r_t$ is not in cache, it is brought in. Then items in cache must be selected for eviction until the set $Q$ of items remaining in cache is cachable, as determined by some specified predicate $\mathrm{cachable}_t(Q, y)$. The cost of evicting an item $r_s$ is specified by a function $\mathrm{cost}(r_s, y)$.

The cachable() predicate and cost() function can be specified arbitrarily, subject to the following restrictions. Predicate $\mathrm{cachable}_t(Q, y)$ must be non-decreasing in $y$ (upgrading the hardware doesn't cause a cachable set to become uncachable) and non-increasing with $Q$ (any subset of a cachable set is cachable). The function $\mathrm{cost}(r_s, y)$ must be



6 If the cost function is linear, in responding to S this algorithm needs to know S and the values of variables in S and their cost 
coefficients. For general submodular costs, the algorithm may need to know not only S, but all variables' values and the whole 
cost function. 

7 This presentation assumes that the last request must stay in cache. If not, don't subtract {r t } from Q in the constraints. The 
competitive ratio goes from k to k + 1. 



non-increasing in y (upgrading the hardware doesn't increase the eviction cost of any item). To model (standard, 
non-upgradable) file caching, take cachable t (Q, y) to be true if J2 r go s ' ze ( r s) ^ &• 

In general, the adversary is free to constrain the cache contents at each step t in any way that depends on t and the 
hardware configuration, as long as upgrading the cache or removing items does not make a cachable set uncachable. 
Likewise, the cost of evicting any item can be determined by the adversary in any way that depends on the item and 
the hardware configuration, as long as upgrading the configuration does not increase any eviction cost. This gives a 
great deal of flexibility in comparison to the standard model. For example, the adversary could insist (among other 
constraints) that no set containing both of two (presumably conflicting) files can be cached. Or, upgrading the hardware 
could reduce the eviction cost of some items arbitrarily, even to zero. 

The optimal cost is achieved by choosing an optimal hardware configuration at the start, then handling all caching 
decisions optimally. To be competitive, an algorithm must also choose a good hardware configuration: an algorithm 
is $\Delta$-competitive if its total cost (eviction cost plus final hardware configuration cost, $\sum_i y_i$) is at most $\Delta$ times the
optimum. (Naturally, when the algorithm evicts an item, it pays the eviction cost in its current hardware configuration. 
Later upgrades do not reduce earlier costs.) 

Next we describe how to model the upgradable problem via online monotone covering with degree $\Delta = k + d$,
where k is the maximum number of files ever held in cache and d is the number of hardware components. This gives 
a simple (k + d)-competitive online algorithm for upgradable caching. 

Theorem 5. Upgradable caching has a (d + k)-competitive online algorithm, where d is the number of upgradable 
components and k is the maximum number of files that can be held in the cache. 

Proof (sketch). Let variable $y_i$ for $i = 1, \ldots, d$ denote the amount invested in component $i$, so that the vector $y$ gives
the current hardware configuration. Let $x_t$ be the cost (if any) incurred for evicting the $t$th requested item $r_t$ at any
time before its next request. The total final cost is $\sum_i y_i + \sum_t x_t$. At time $t$, if some subset $Q \subseteq \{r_s : s \le t\}$ of the
items is not cachable, then at least one item $r_s \in Q - \{r_t\}$ (where $s$ is the time of the most recent request to $r_s$) must
have been evicted, so the following constraint is met:

$$\text{cachable}_t(Q, y) \ \text{ or } \ \sum_{r_s \in Q - \{r_t\}} \lfloor x_s / \text{cost}(r_s, y) \rfloor \ge 1. \qquad (S_t(Q))$$

The restrictions on cachable and cost ensure that this constraint is monotone in x and y. 

The greedy algorithm initializes $y = 0$, $x = 0$, and $Q = \emptyset$. It caches the subset $Q$ of requested items $r_s$ with
$x_s < \text{cost}(r_s, y)$. To respond to request $r_t$ (which adds $r_t$ to the cache if not present), the algorithm raises each $y_i$
and each $x_s$ for $r_s$ in $Q - \{r_t\}$ at unit rate. It evicts any $r_s$ with $x_s \ge \text{cost}(r_s, y)$, until $\text{cachable}_t(Q, y)$ holds for the
cached set $Q$. The degree$^8$ $\Delta$ is the maximum size of $Q - \{r_t\}$, plus $d$ for $y$. □
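
To make the greedy rule concrete, here is a minimal event-driven sketch in Python for one very special case. Everything about the model is an assumption made for illustration: there is a single upgradable component ($d = 1$), all items have unit size, the cache holds `base_k + floor(y / slot_price)` items, and eviction costs do not depend on $y$. The names `upgradable_cache`, `base_k`, and `slot_price` are hypothetical.

```python
import math

def upgradable_cache(requests, cost, base_k, slot_price):
    """Greedy sketch: raise y and the x_s at unit rate until the cache fits."""
    eps = 1e-12          # tolerance for floating-point comparisons
    y = 0.0              # amount spent so far on extra cache slots
    x = {}               # x[s]: value accrued by item s since its last request
    cache = set()
    evictions = []

    def capacity():
        return base_k + math.floor(y / slot_price + eps)

    for r in requests:
        cache.add(r)
        x[r] = 0.0
        while len(cache) > capacity():           # cached set is not cachable
            others = sorted(s for s in cache if s != r)
            # Time until the next slot is paid for, or until some x_s reaches cost.
            to_slot = slot_price * (math.floor(y / slot_price + eps) + 1) - y
            to_evict = min((cost[s] - x[s] for s in others), default=math.inf)
            dt = min(to_slot, to_evict)
            y += dt                              # raise y at unit rate
            for s in others:
                x[s] += dt                       # raise each x_s at unit rate
                if x[s] >= cost[s] - eps:        # x_s reached cost(r_s): evict
                    cache.discard(s)
                    evictions.append(s)
    return y, evictions, cache
```

On the request sequence a, b, a, c with unit eviction costs, `base_k = 1`, and `slot_price = 3`, this sketch evicts a, b, and a again, and by the last request it has also invested 3 units, which buys a second cache slot.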

This result generalizes easily to "upgradable" monotone caching, where investing in some d components can relax 
constraints or reduce costs. 

Restricting groups of items (such as segments within files). The HTTP protocol allows retrieval of segments of
files. To model this in this setting, consider each file $f$ as a group of arbitrary segments (e.g. bytes or pages). Let $x_t$
be the number of segments of file $r_t$ evicted before its next request. Let $c(x_t)$ be the cost to retrieve the cheapest $x_t$
segments of the file, so the total cost is $\sum_t c(x_t)$. Then, for example, to say that the cache can hold at most $k$ segments
total, add constraints of the form (for appropriate subsets $Q$ of requests) $\sum_{s \in Q} \text{size}(r_s) - \lfloor x_s \rfloor \le k$ (where $\text{size}(r_s)$
is the number of segments in $r_s$). When the greedy algorithm increases $x_s$ to $x'_s$, the online algorithm evicts segments
$\lfloor x_s \rfloor + 1$ through $\lfloor x'_s \rfloor$ of file $r_s$ (assuming segments are ordered by cheapest retrieval).

Generally, any monotone restriction that is a function of just the number of segments evicted from each file (as
opposed to which specific segments are evicted), can be modeled. (For example, "evict at least 3 segments of $r_s$ or
at least 4 segments from $r_t$": $\lfloor x_s/3 \rfloor + \lfloor x_t/4 \rfloor \ge 1$.) Although the caching constraints constrain file segments, the
competitive ratio will be the maximum number of files (as opposed to segments) referred to in any constraint.
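
The floor-based segment constraints are easy to evaluate directly. The following Python sketch shows the three pieces mentioned above: which segments are evicted when $x_s$ rises, the "at most $k$ segments" constraint, and the "3 of $r_s$ or 4 of $r_t$" example. The function names and the dictionary-based representation are hypothetical.

```python
import math

def evicted_segments(x_old, x_new):
    """Segments (1-based, cheapest first) evicted when x_s rises from x_old
    to x_new: floor(x_old) + 1 through floor(x_new)."""
    return list(range(math.floor(x_old) + 1, math.floor(x_new) + 1))

def cache_fits(size, x, Q, k):
    """Constraint: sum over s in Q of (size[s] - floor(x[s])) is at most k."""
    return sum(size[s] - math.floor(x[s]) for s in Q) <= k

def example_restriction(x_s, x_t):
    """'Evict at least 3 segments of r_s or at least 4 segments of r_t'."""
    return math.floor(x_s / 3) + math.floor(x_t / 4) >= 1
```

For instance, raising $x_s$ from 1.5 to 4.2 evicts segments 2, 3, and 4, while raising it from 2.0 to 2.9 evicts nothing, since the floor does not change.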



8 The algorithm enforces just some constraints $S_t(Q)$; $\Delta$ is defined w.r.t. the problem defined by those constraints.



subroutine $\text{rstep}_c(x, S)$:    alg. 3

1. Fix an arbitrary probability $p_j \in [0, 1]$ for each $j \in \text{vars}(S)$.    ... taking each $p_j = 1$ gives Alg. 1

2. Choose a scalar step size $\beta \ge 0$.

3. For $j \in \text{vars}(S)$ with $p_j > 0$, let $X_j$ be the max. s.t. raising $x_j$ to $X_j$ would raise $c(x)$ by $\le \beta/p_j$.

4. For $j \in \text{vars}(S)$ with $p_j > 0$, with probability $p_j$, let $x_j \leftarrow X_j$.    ... these events can be dependent if desired!



subroutine $\text{stateless-rstep}_c(x, S, U)$:    ... do rstep, and keep each $x_j$ in its (countable) domain $U_j$ ...    alg. 4

1. For $j \in \text{vars}(S)$, let $X_j = \min\{z \in U_j \mid z > x_j\}$ (or $X_j = x_j$ if the minimum is undefined).

2. Let $\alpha_j$ be the increase in $c(x)$ that would result from increasing just $x_j$ to $X_j$.

3. Do $\text{rstep}_c(x, S)$, choosing any $\beta \in (0, \min_j \alpha_j]$ and $p_j = \beta/\alpha_j$ (or $p_j = 0$ if $X_j = x_j$).



6 Randomized Variant of Alg. 1 and Stateless Online Algorithm 



This section describes a randomized, online generalization of Alg. 1. It has more flexibility than Alg. 1 in how it
increases variables. This can be useful, for example, in distributed settings, in dealing with numerical precision issues, 
and in obtaining stateless online algorithms (an example follows). 

The algorithm is Alg. 1, modified to call subroutine $\text{rstep}_c(x, S)$ (shown in Alg. 3) instead of $\text{step}_c(x, S)$. The
subroutine has more flexibility in incrementing $x$. Its step-size requirement is a bit more complicated.

Theorem 6. For monotone covering suppose the randomized greedy algorithm terminates, and, in each step, $\beta$ is at
most $\min\{E_p[c(x \uparrow_p \hat{x}) - c(x)] : \hat{x} \ge x;\ \hat{x} \in S\}$, where $x \uparrow_p \hat{x}$ is a random vector obtained from $x$ by raising $x_j$ to $\hat{x}_j$
with probability $p_j$ for each $j \in \text{vars}(S)$. Then the algorithm returns a $\Delta$-approximate solution in expectation.

If the objective $c(x)$ is linear, the required upper bound on $\beta$ above simplifies to $\text{distance}_{c'}(x, S)$ where $c'_j = p_j c_j$.

Proof (sketch). We claim that, in each step, the expected increase in $c(x)$ is at most $\Delta$ times the expected decrease in
$\text{residual}(x)$. This implies (by the optional stopping theorem) that $E[c(x_{\text{final}})] \le \Delta \times \text{residual}(0)$, proving the theorem.
Fix any step starting with a given $x$. Let (r.v.) $x'$ be $x$ after the step. Fix feasible $x^* \ge x$ s.t. $\text{residual}(x) =
c(x^*) - c(x)$. Inequality (1) holds; to prove the claim we show $E_{x'}[c(x' \wedge x^*) - c(x)] \ge \beta$. Since $x^* \ge x$ and
$x' = x \uparrow_p X$, this is equivalent to $E[c(x \uparrow_p X) - c(x)] \ge \beta$.

(Case 1.) Suppose $X_k < x^*_k$ for some $k \in \text{vars}(S)$ with $p_k > 0$. Let $y$ be obtained from $x$ by raising just $x_k$ to $X_k$.
Then with probability $p_k$ or more, $c(x \uparrow_p X) \ge c(y) \ge c(x) + \beta/p_k$. Thus the expectation is at least $\beta$.
(Case 2.) Otherwise, $X_j \ge x^*_j$ for all $j$ with $p_j > 0$. Then $E[c(x \uparrow_p X) - c(x)] \ge E[c(x \uparrow_p x^*) - c(x)]$. Since
$x^* \ge x$ and $x^* \in S$, this is at least $\beta$ by the assumption on $\beta$. □

A stateless online algorithm. As described in the introduction, when the variables have restricted domains ($x_j \in U_j$),
Alg. 1 constructs $x$ and then "rounds" $x$ down to $\mu(x)$. In the online setting, Alg. 1 maintains $x$ as constraints are
revealed; meanwhile, it uses $\mu(x)$ as its current online solution. In this sense, it is not stateless. A stateless algorithm
can maintain only one online solution, each variable of which should stay in its restricted domain.

Next we use Thm. 6 to give a stateless online algorithm. The algorithm generalizes the Harmonic $k$-server algorithm
as it specializes for paging and caching [47], and Pitt's weighted vertex cover algorithm [4]. Given an unsatisfied
constraint $S$, the algorithm increases each $x_j$ for $j \in \text{vars}(S)$ to its next largest allowed value, with probability
inversely proportional to the resulting increase in cost. (The algorithm can be tuned to increase just one, or more than
one, $x_j$. It repeats the step until the constraint is satisfied.)

Formally, the stateless algorithm is the randomized algorithm from Thm. 6 but with the subroutine $\text{rstep}_c(x, S)$
replaced by $\text{stateless-rstep}_c(x, S, U)$ (in Alg. 4), which executes $\text{rstep}_c(x, S)$ in a particular way. (A technicality: if
$0 \notin U_j$, then $x_j$ should be initialized to $\min U_j$ instead of 0. This does not affect the approximation ratio.)
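
As an illustration, the stateless step can be specialized to weighted vertex cover, where each $U_j = \{0, 1\}$ and the cost is linear, so raising $x_j$ from 0 to 1 increases the cost by $\alpha_j = c_j$. The Python sketch below is an assumption-laden example, not the paper's code: the name `stateless_vertex_cover`, the parameter `beta_scale` (which picks $\beta$ as a fraction of $\min_j \alpha_j$), and the use of independent coin flips are all illustrative choices. Pitt's algorithm corresponds to a different, dependent choice of events.

```python
import random

def stateless_vertex_cover(edges, cost, rng, beta_scale=1.0):
    """Stateless randomized sketch: every x[v] stays in {0, 1} at all times."""
    x = {v: 0 for v in cost}
    for u, v in edges:                       # constraints are revealed online
        while x[u] + x[v] < 1:               # edge (u, v) is still uncovered
            # alpha_j = cost[j]: cost increase from raising x_j from 0 to 1.
            beta = beta_scale * min(cost[u], cost[v])   # beta in (0, min_j alpha_j]
            for j in (u, v):
                p_j = beta / cost[j]         # probability inversely prop. to alpha_j
                if rng.random() < p_j:       # independent events (one allowed choice)
                    x[j] = 1
    return x
```

With `beta_scale = 1.0` the cheaper endpoint of each uncovered edge is always raised, so each constraint is satisfied in one step; smaller values make the step more random and may repeat it several times before the edge is covered.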

Theorem 7. For monotone covering with discrete variable domains as described above, there is a stateless randomized
online $\Delta$-approximation algorithm.

Proof (sketch). By inspection $\text{stateless-rstep}_c(x, S, U)$ maintains each $x_j \in U_j$.

We show that $\text{stateless-rstep}_c(x, S, U)$ performs $\text{rstep}_c(x, S)$ in a way that satisfies the requirement on $\beta$ in
Thm. 6. Let $\hat{x}$ be as in the proof of Thm. 6, with the added restriction that each $\hat{x}_j \in U_j$. Since $\hat{x} \in S$ but $x \notin S$, there
is a $k \in \text{vars}(S)$ with $\hat{x}_k > x_k$. Since $\hat{x}_k \in U_k$, the choice of $X_k$ ensures $\hat{x}_k \ge X_k$. Let $y$ be obtained from $x$ by
raising $x_k$ to $X_k$. Then, $E[c(x \uparrow_p \hat{x}) - c(x)] \ge p_k[c(y) - c(x)] = p_k \alpha_k = \beta$, satisfying Thm. 6. □



7 Relation to Primal-Dual and Local-Ratio Methods 

Primal-Dual. Here we speculate about how Thm. 1 might be cast as a primal-dual analysis. Given a vector $v$, consider
its "shadow" $s(v) = \{x : \exists j\ x_j \ge v_j\}$. Any monotone set $S$ is the intersection of the shadows of its boundary points:
$S = \bigcap_{v \in \partial S} s(v)$. Thus, any monotone covering instance can be recast to use only shadow sets for constraints. Any
shadow set $s(v)$ is of the form $s(v) = \{x : \sum_j \lfloor x_j / v_j \rfloor \ge 1\}$, a form similar to that of the CMIP constraints
$S(I, A_i, u, b_i, d)$ in Section 3. We conjecture that the Knapsack Cover (KC) inequalities from [15] for CIP can be
generalized to give valid inequalities with integrality gap $\Delta$ for constraints of this form. (Indeed, the result in Section 3
easily extends to handle such constraints.) This could yield an appropriate relaxation on which a primal-dual analysis
could be based.

For even simple instances, generating a $\Delta$-approximate primal-dual pair for the greedy algorithm here requires a
"tail-recursive" dual solution implicit in some local-ratio analyses [9], as opposed to the typical forward-greedy dual
solution.$^9$ Even if the above program (extended to non-linear cost functions!) can be carried out, it seems likely to lead
to a less intuitive proof than that of Thm. 1.
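
The three-variable instance in footnote 9 can be checked numerically. The Python sketch below assumes the linear-cost form of the greedy step recalled in this section (raise each variable in the constraint by $\beta/c_j$, with $\beta$ chosen maximally, i.e. equal to the cheapest cost of satisfying the constraint). The function name `greedy_cover` and the encoding of each constraint as a pair (variable indices, right-hand side) are hypothetical.

```python
def greedy_cover(constraints, n, costs):
    """Greedy sketch for constraints of the form sum_{j in vars} x_j >= b."""
    x = [0.0] * n
    for vars_, b in constraints:
        deficit = b - sum(x[j] for j in vars_)
        if deficit <= 0:
            continue                       # constraint already satisfied
        # Maximal step: the cheapest cost of satisfying this constraint from x.
        beta = deficit * min(costs[j] for j in vars_)
        for j in vars_:
            x[j] += beta / costs[j]        # raise every variable by beta / c_j
    return x

costs = [1.0, 1.0, 1.0]
c12 = ([0, 1], 1.0)                        # x_1 + x_2 >= 1
c13 = ([0, 2], 2.0)                        # x_1 + x_3 >= 2
x_forward = greedy_cover([c12, c13], 3, costs)
x_reverse = greedy_cover([c13, c12], 3, costs)
```

In both orders the total cost is 4, twice the optimum of 2 (attained by $x_1 = 2$), which matches $\Delta = 2$ for this instance.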

Local-Ratio. The local-ratio method has most commonly been applied to problems with variables $x_j$ taking values
in $\{0, 1\}$ and with linear objective function $c \cdot x$ (see [7,4,9,5]; for one exception, see [8]). In these cases, each step
of the algorithm is typically interpreted as modifying the problem by repeatedly reducing selected objective function
weights $c_j$ by some $\beta$. At the end, the $x$, where $x_j$ is raised from 0 to 1 if $c_j = 0$, gives the solution. At each step, the
weights to lower are chosen so that the change must decrease OPT's cost by at least $\beta$, while increasing the cost for
the algorithm's solution by at most $\Delta\beta$. This guarantees a $\Delta$-approximate solution.

In contrast, recall that Alg. 1 raises selected $x_j$'s fractionally by $\beta/c_j$. At the end, $x_j$ is rounded down to $\lfloor x_j \rfloor$.
Each step costs $\beta\Delta$, but reduces the residual cost by at least $\beta$.

For problems with variables $x_j$ taking values in $\{0, 1\}$ and with linear objective function $c \cdot x$, Alg. 1 can be given
the following straightforward local-ratio interpretation. Instead of raising $x_j$ by $\beta/c_j$, reduce $c_j$ by $\beta$. At the end,
instead of setting $x_j$ to $\lfloor x_j \rfloor$, set $x_j = 1$ if $c_j = 0$. With this reinterpretation, a standard local-ratio analysis applies.

To understand the relation between the two interpretations, let $c'$ denote the modified weights in the above reinterpretation.
The reinterpreted algorithm maintains the following invariants: Each modified weight $c'_j$ stays equal to
$c_j(1 - x_j)$ (for $c$ and $x$ in the original interpretation; this is the cost to raise $x_j$ the rest of the way to 1). Also, the
residual cost $\text{residual}(x)$ in the original interpretation equals (in the reinterpreted algorithm) the minimum cost to solve
the original problem but with weights $c'$.
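
The first invariant can be checked by running the two interpretations in lock-step on a small weighted vertex cover instance. The Python sketch below is only illustrative: the name `run_both`, the choice of $\beta$ as the smallest remaining modified weight on the edge, and the numerical tolerance are assumptions made for this example.

```python
def run_both(edges, cost):
    """Run the x-raising view and the weight-reducing view side by side."""
    tol = 1e-9
    x = {v: 0.0 for v in cost}              # original interpretation
    w = dict(cost)                          # modified weights c' (local-ratio view)
    for u, v in edges:
        if x[u] >= 1 - tol or x[v] >= 1 - tol:
            continue                        # edge covered once x is rounded down
        beta = min(w[u], w[v])              # cheapest way to finish covering the edge
        for j in (u, v):
            x[j] += beta / cost[j]          # original: raise x_j by beta / c_j
            w[j] -= beta                    # local ratio: lower c'_j by beta
            # Invariant: c'_j = c_j (1 - x_j).
            assert abs(w[j] - cost[j] * (1 - x[j])) < tol
    cover_from_x = {v for v in cost if x[v] >= 1 - tol}   # floor(x_v) = 1
    cover_from_w = {v for v in cost if w[v] <= tol}       # c'_v = 0
    return cover_from_x, cover_from_w
```

On the path a, b, c with costs 2, 3, 1 and edges (a, b), (b, c), both views return the same cover, as the invariant predicts.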

This local-ratio reinterpretation is straightforward and intuitive for problems with $\{0, 1\}$ variables and a linear
objective. But for problems whose variables take values in more general domains, it does not extend cleanly. For
example, suppose a variable $x_j$ takes values in $\{0, 1, 2, \ldots, u\}$. The algorithm cannot afford to reduce the weight $c_j$,
and then, at termination, set $x_j$ to $u$ for $j$ with $c_j = 0$ (this can lose a factor of $u$ in the approximation). Instead, one has
to reinterpret the modified weight $c'_j$ as a vector of weights $c'_j : \{1, \ldots, u\} \to \mathbb{R}_+$ where $c'_j(i)$ is the cost to raise $x_j$
from $\max\{x_j, i - 1\}$ to $\max\{x_j, i\}$ (initially $c'_j(i) = c_j$). When the original algorithm raises $x_j$ by $\beta/c_j$, reinterpret
this as leaving $x_j$ at zero, but lowering the non-zero $c'_j(i)$ with minimum $i$ by $\beta$. At the end, take $x_j$ to be the maximum
$i$ such that $c'_j(i) = 0$. We show next that this approach is doable (if less intuitive) for monotone covering.

At a high level, the local-ratio method requires only that the objective be decomposed into "locally approximable"
objectives. The common weight-reduction presentation of local ratio described above gives one decomposition, but
others have been used. A local-ratio analysis for an integer programming problem with non-$\{0, 1\}$ variable domains,
based on something like residual(), is used in [8]. Here, the following decomposition (different from [8]) works:

Lemma 2. Any algorithm returns a $\Delta$-approximate solution $x$ provided there exist $\{c^t\}$ and $r$ such that

(a) for any $x$, $c(x) = c(0) + r(x) + \sum_{t=1}^{T} c^t(x)$,

(b) for all $t$, and any $x$ and feasible $x^*$, $c^t(x) \le c^t(x^*)\Delta$,

(c) the algorithm returns $x$ such that $r(x) = 0$.



9 For example, consider $\min\{x_1 + x_2 + x_3 : x_1 + x_2 \ge 1,\ x_1 + x_3 \ge 2\}$. If the greedy algorithm does the constraints in either
order and chooses $\beta$ maximally, it gives a solution of cost 4. In the dual $\max\{y_{12} + 2y_{13} : y_{12} + y_{13} \le 1\}$, the only way
to generate a solution of cost 2 is to set $y_{13} = 1$ and $y_{12} = 0$. If the primal constraint for $y_{12}$ is considered first, $y_{12}$ cannot
be assigned a non-zero value. Instead, one should consider the dual variables for constraints for which steps were done, in the
reverse order of those steps, raising each until a constraint is tight.



Proof. Let $x^*$ be an optimal solution. Applying properties (a) and (c), then (b), then (a),

$$c(x) = c(0) + \sum_{t=1}^{T} c^t(x) \le c(0)\Delta + \sum_{t=1}^{T} c^t(x^*)\Delta + r(x^*)\Delta = c(x^*)\Delta. \qquad \Box$$

Next we describe how to use the proof of Thm. 1 (based on residual cost) to generate such a decomposition.

Let $\text{distance}(x, y) = c(x \vee y) - c(x)$ (the cost to raise $x$ to dominate $y$).

For any $x$, define $c^t(x) = \text{distance}(x^{t-1}, x) - \text{distance}(x^t, x)$, where $x^t$ is Alg. 1's $x$ after $t$ calls to step().

Define $r(x) = \text{distance}(x^T, x)$, where $x^T$ is the algorithm's solution.

For linear $c$ note $c^t(x) = \sum_j c_j\, |[0, x_j] \cap [x_j^{t-1}, x_j^t]|$, the cost for $x$ "between" $x^{t-1}$ and $x^t$.

Lemma 3. These $c^t$ and $r$ have properties (a)-(c) from Lemma 2, so the algorithm gives a $\Delta$-approximation.

Proof. Part (a) holds because the sum in (a) telescopes to $\text{distance}(0, x) - \text{distance}(x^T, x) = c(x) - c(0) - r(x)$.
Part (c) holds because the algorithm returns $x^T$, and $r(x^T) = \text{distance}(x^T, x^T) = 0$.
For (b), consider the $t$th call to step(). Let $\beta$ be as in that call.

The triangle inequality holds for distance(), so, for any $x$, $c^t(x) \le \text{distance}(x^{t-1}, x^t) = c(x^t) - c(x^{t-1})$.
As proved in the proof of Thm. 1, $c(x^t) - c(x^{t-1})$ is at most $\beta\Delta$.

Also in the proof of Thm. 1 it is argued that $\beta \le \text{distance}(x^{t-1}, \bigcap_{S \in \mathcal{C}} S) - \text{distance}(x^t, \bigcap_{S \in \mathcal{C}} S)$.
By inspection that argument holds for any $x^* \in \bigcap_{S \in \mathcal{C}} S$, giving $\beta \le \text{distance}(x^{t-1}, x^*) - \text{distance}(x^t, x^*)$.
The latter quantity is $c^t(x^*)$. Thus, $c^t(x) \le \beta\Delta \le c^t(x^*)\Delta$. □
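
For a linear objective the decomposition is simple to compute, which allows a quick numerical check of properties (a) and (b) on a small example. The Python sketch below assumes a linear cost with $c(0) = 0$, for which $\text{distance}(x, y) = \sum_j c_j \max(y_j - x_j, 0)$. The function names and the hard-coded iterates $(0,0,0)$, $(1,1,0)$, $(2,1,1)$ (one possible greedy run on the instance of footnote 9) are illustrative assumptions.

```python
def distance(c, x, y):
    """Linear-cost distance: the cost to raise x until it dominates y."""
    return sum(cj * max(yj - xj, 0.0) for cj, xj, yj in zip(c, x, y))

def decomposition(c, trajectory, x):
    """Return ([c^1(x), ..., c^T(x)], r(x)) for the iterates x^0, ..., x^T."""
    parts = [distance(c, prev, x) - distance(c, cur, x)
             for prev, cur in zip(trajectory, trajectory[1:])]
    return parts, distance(c, trajectory[-1], x)

c = [1.0, 1.0, 1.0]
trajectory = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]]
x_opt = [2.0, 0.0, 0.0]                     # an optimal solution, cost 2
x_alg = trajectory[-1]                      # the algorithm's solution, cost 4

parts_opt, r_opt = decomposition(c, trajectory, x_opt)
parts_alg, r_alg = decomposition(c, trajectory, x_alg)
```

Here each $c^t(x^*)$ equals 1 and each $c^t(x^T)$ equals 2, so property (b) holds with $\Delta = 2$, and in both cases the parts plus $r$ sum to the cost of the vector, as property (a) requires.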
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